Copy detection in Chinese documents using the Ferret
Abstract
The Ferret copy detector has been used for some years on English texts to find plagiarism in large collections of students’ coursework. This article reports on extending its application to Chinese. Corpora of coursework from two Chinese universities have been collected, and our experiments show that the Ferret can find both artificially constructed plagiarism and also actually occurring, previously undetected plagiarism. We discuss issues of representation, focus on the effectiveness of a sub-symbolic approach, and show that the Ferret does not need to find word boundaries.
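The point about word boundaries can be illustrated with a minimal sketch. It assumes documents are compared as sets of overlapping character trigrams, scored with a Jaccard-style overlap. It is an illustration only, not the Ferret's actual implementation, and the function names `char_trigrams` and `resemblance` are hypothetical.

```python
def char_trigrams(text: str) -> set[str]:
    """Return the set of overlapping 3-character sequences in text.

    Whitespace is dropped, and no word segmentation is attempted:
    each character is treated as one token (an assumption of this sketch).
    """
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + 3]) for i in range(len(chars) - 2)}


def resemblance(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets: |A & B| / |A | B|."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    union = ta | tb
    # Two texts too short to yield any trigram are scored 0.0 by convention.
    return len(ta & tb) / len(union) if union else 0.0
```

Because the trigrams are taken over raw characters, the same functions apply unchanged to Chinese text, for example `resemblance("我爱北京天安门", "我爱北京")`. A high score between two submissions would flag the pair for human inspection rather than prove copying.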
Similar resources
Copy detection in Chinese documents using the Ferret: a report on experiments
The Ferret copy detector has been used for some years on English texts to find plagiarism in large collections of students’ coursework. This article reports on extending its application to Chinese, which differs from English in many respects: the sequence of characters that make up a Chinese text do not have word boundaries marked, there is a vast Chinese “alphabet”, or number of different char...
Content-based Plagiarism Detection in Korean Document Using Ferret’s Trigram
Document plagiarism means the unauthorized use of the original document of another author without recognition of the source. With the development of the Internet, the volume of digital information available and easily accessible has increased massively and detecting plagiarism manually is so expensive in terms of both time and effort. Although many copy detection techniques for digital document...
Detection of Copy-Move Forgery in Digital Images Using Scale Invariant Feature Transform Algorithm and the Spearman Relationship
Increased popularity of digital media and image editing software has led to the spread of multimedia content forgery for various purposes. Undoubtedly, law and forensic medicine experts require trustworthy and non-forged images to enforce rights. Copy-move forgery is the most common type of manipulation of digital images. Copy-move forgery is used to hide an area of the image or to repeat a por...
Performance evaluation of block-based copy-move image forgery detection algorithms
Copy-move forgery is a particular type of distortion where a part or portions of one image is/are copied to other parts of the same image. This type of manipulation is done to hide a particular part of the image or to copy one or more objects into the same image. There are several methods for detecting copy-move forgery, including block-based and key point-based methods. In this paper, a method...
DETECTING SIMILAR HTML DOCUMENTS USING A SENTENCE-BASED COPY DETECTION APPROACH
Rajiv Yerra, Department of Computer Science, Master of Science. Web documents that are either partially or completely duplicated in content are easily found on the Internet these days. Not only these documents create redundant information on the Web, which take longer to filter unique information and cause additional s...